60 research outputs found
Analyzing large-scale DNA Sequences on Multi-core Architectures
Rapid analysis of DNA sequences is important in preventing the evolution of
different viruses and bacteria during an early phase, early diagnosis of
genetic predispositions to certain diseases (cancer, cardiovascular diseases),
and in DNA forensics. However, real-world DNA sequences may comprise several
Gigabytes and the process of DNA analysis demands adequate computational
resources to be completed within a reasonable time. In this paper we present a
scalable approach for parallel DNA analysis that is based on Finite Automata,
and which is suitable for analyzing very large DNA segments. We evaluate our
approach for real-world DNA segments of mouse (2.7GB), cat (2.4GB), dog
(2.4GB), chicken (1GB), human (3.2GB) and turkey (0.2GB). Experimental results
on a dual-socket shared-memory system with 24 physical cores show speed-ups of
up to 17.6x. Our approach is up to 3x faster than a pattern-based parallel
approach that uses the RE2 library.Comment: The 18th IEEE International Conference on Computational Science and
Engineering (CSE 2015), Porto, Portugal, 20 - 23 October 201
HSTREAM: A directive-based language extension for heterogeneous stream computing
Big data streaming applications require utilization of heterogeneous parallel
computing systems, which may comprise multiple multi-core CPUs and many-core
accelerating devices such as NVIDIA GPUs and Intel Xeon Phis. Programming such
systems require advanced knowledge of several hardware architectures and
device-specific programming models, including OpenMP and CUDA. In this paper,
we present HSTREAM, a compiler directive-based language extension to support
programming stream computing applications for heterogeneous parallel computing
systems. HSTREAM source-to-source compiler aims to increase the programming
productivity by enabling programmers to annotate the parallel regions for
heterogeneous execution and generate target specific code. The HSTREAM runtime
automatically distributes the workload across CPUs and accelerating devices. We
demonstrate the usefulness of HSTREAM language extension with various
applications from the STREAM benchmark. Experimental evaluation results show
that HSTREAM can keep the same programming simplicity as OpenMP, and the
generated code can deliver performance beyond what CPUs-only and GPUs-only
executions can deliver.Comment: Preprint, 21st IEEE International Conference on Computational Science
and Engineering (CSE 2018
The Potential of the Intel Xeon Phi for Supervised Deep Learning
Supervised learning of Convolutional Neural Networks (CNNs), also known as
supervised Deep Learning, is a computationally demanding process. To find the
most suitable parameters of a network for a given application, numerous
training sessions are required. Therefore, reducing the training time per
session is essential to fully utilize CNNs in practice. While numerous research
groups have addressed the training of CNNs using GPUs, so far not much
attention has been paid to the Intel Xeon Phi coprocessor. In this paper we
investigate empirically and theoretically the potential of the Intel Xeon Phi
for supervised learning of CNNs. We design and implement a parallelization
scheme named CHAOS that exploits both the thread- and SIMD-parallelism of the
coprocessor. Our approach is evaluated on the Intel Xeon Phi 7120P using the
MNIST dataset of handwritten digits for various thread counts and CNN
architectures. Results show a 103.5x speed up when training our large network
for 15 epochs using 244 threads, compared to one thread on the coprocessor.
Moreover, we develop a performance model and use it to assess our
implementation and answer what-if questions.Comment: The 17th IEEE International Conference on High Performance Computing
and Communications (HPCC 2015), Aug. 24 - 26, 2015, New York, US
Using Cognitive Computing for Learning Parallel Programming: An IBM Watson Solution
While modern parallel computing systems provide high performance resources,
utilizing them to the highest extent requires advanced programming expertise.
Programming for parallel computing systems is much more difficult than
programming for sequential systems. OpenMP is an extension of C++ programming
language that enables to express parallelism using compiler directives. While
OpenMP alleviates parallel programming by reducing the lines of code that the
programmer needs to write, deciding how and when to use these compiler
directives is up to the programmer. Novice programmers may make mistakes that
may lead to performance degradation or unexpected program behavior. Cognitive
computing has shown impressive results in various domains, such as health or
marketing. In this paper, we describe the use of IBM Watson cognitive system
for education of novice parallel programmers. Using the dialogue service of the
IBM Watson we have developed a solution that assists the programmer in avoiding
common OpenMP mistakes. To evaluate our approach we have conducted a survey
with a number of novice parallel programmers at the Linnaeus University, and
obtained encouraging results with respect to usefulness of our approach
PaREM: A Novel Approach for Parallel Regular Expression Matching
Regular expression matching is essential for many applications, such as
finding patterns in text, exploring substrings in large DNA sequences, or
lexical analysis. However, sequential regular expression matching may be
time-prohibitive for large problem sizes. In this paper, we describe a novel
algorithm for parallel regular expression matching via deterministic finite
automata. Furthermore, we present our tool PaREM that accepts regular
expressions and finite automata as input and automatically generates the
corresponding code for our algorithm that is amenable for parallel execution on
shared-memory systems. We evaluate our parallel algorithm empirically by
comparing it with a commonly used algorithm for sequential regular expression
matching. Experiments on a dual-socket shared-memory system with 24 physical
cores show speed-ups of up to 21x for 48 threads.Comment: CSE-2014, Dec. 19th - 21st, 2014, Chengdu, Sichuan, Chin
- …